feat(bonsai): compact-output mode + persistent KV cache #169
Merged
KailasMahavarkar merged 1 commit into main on Apr 20, 2026
Two CPU-agnostic speed levers on top of BonsaiIngestor. Both produce
measurable wall-clock wins without any machine-specific tuning.
1. Compact-output mode (compact=True)
LLM emits ~30 tokens of ENTS/BELIEFS/RETRACTS instead of ~150 tokens
of full DSL. Python synthesizes the DSL deterministically.
- New skill: tools/skills/graphstore-bonsai-dsl-compact/SKILL.md
  (~620 source tokens, clear rules + negative examples so the model does
  NOT promote third-person observations to first-person beliefs).
- Parser: _parse_compact_output(cleaned) -> CompactTurn
- Templates: _synthesize_dsl(turn, msg_id=..., session_id=..., role=..., text=...)
  Deterministic DSL builder; same input always produces same output
  (see sketch below).
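To make the compact path concrete, here is a minimal sketch of the
parser/synthesizer pair. The compact grammar (one "SECTION: item; item"
line per section), the CompactTurn fields, and the DSL statement syntax
are illustrative assumptions; only the function names and the behaviors
listed in the tests below come from this PR.

```python
# Illustrative sketch only -- real grammar and DSL syntax may differ.
from dataclasses import dataclass, field

@dataclass
class CompactTurn:
    ents: list[str] = field(default_factory=list)      # entity names
    beliefs: list[str] = field(default_factory=list)   # "key=value" items
    retracts: list[str] = field(default_factory=list)  # keys to retract

def _parse_compact_output(cleaned: str) -> CompactTurn:
    # Missing sections and unknown prefixes are ignored; prefix match is
    # case-insensitive (mirrors the parser tests listed below).
    turn = CompactTurn()
    sections = {"ENTS": turn.ents, "BELIEFS": turn.beliefs,
                "RETRACTS": turn.retracts}
    for line in cleaned.splitlines():
        prefix, _, rest = line.partition(":")
        bucket = sections.get(prefix.strip().upper())
        if bucket is None:
            continue
        bucket.extend(x.strip() for x in rest.split(";") if x.strip())
    return turn

def _synthesize_dsl(turn: CompactTurn, *, msg_id: str, session_id: str,
                    role: str, text: str) -> str:
    # Deterministic: no randomness, stable ordering, order-preserving dedupe.
    esc = lambda s: s.replace('"', '\\"')               # quote escaping
    out = [f'MSG id="{msg_id}" session="{session_id}" '
           f'role="{role}" text="{esc(text)}"']
    out += [f'ENT name="{esc(e)}"' for e in dict.fromkeys(turn.ents)]
    out += [f'BELIEF {b} src="{msg_id}"' for b in turn.beliefs]
    out += [f'RETRACT {r} src="{msg_id}"' for r in turn.retracts]
    return "\n".join(out)
```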
Wins (4B TQ1_0, CPU only):
warm avg: 3.9s -> 1.7s (2.3x faster)
cold: 19.6s -> 10.1s (1.9x faster)
5-msg: 35.3s -> 16.9s (2.1x faster)
raw out: 335B -> 98B (3.4x smaller)
Quality on T1-T5 smoke: all 5 messages ingested, 1 correct belief
(fact:favorite_color=green), zero spurious beliefs, fact_id reuse
working across the correction test.
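For context, a hedged sketch of how the warm-average number above could
be measured (the import path, model filename, and messages are
assumptions, not part of this PR):

```python
# Hypothetical timing harness -- import path and model file are assumptions.
import time
from bonsai import BonsaiIngestor  # actual module path may differ

ing = BonsaiIngestor(model_path="models/4b-tq1_0.gguf", compact=True)
ing.warmup()                       # pay the cold cost once, outside the loop

times = []
for i in range(5):
    t0 = time.perf_counter()
    ing.ingest(f"My favorite color is green. ({i})",
               msg_id=f"m:s1:{i}", session_id="s1")
    times.append(time.perf_counter() - t0)
print(f"warm avg: {sum(times) / len(times):.1f}s")
```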
API shape:
ing = BonsaiIngestor(model_path=..., compact=True, kv_cache_path=...)
ing.ingest(text, msg_id="m:s1:0", session_id="s1") # msg_id required
# compact=False keeps the full-DSL path unchanged (backward compat)
2. Persistent KV cache (kv_cache_path=...)
llama.cpp's save_state/load_state pickled to disk. Eliminates the
~10s cold penalty on every process restart (serverless, CLI
one-shots, dev iteration).
Workflow:
run 1: ing = BonsaiIngestor(..., kv_cache_path="/tmp/bonsai_kv.bin")
ing.warmup()
ing.save_kv_cache()
run 2: ing = BonsaiIngestor(..., kv_cache_path="/tmp/bonsai_kv.bin")
ing.ingest("...") # <- 2.0s instead of ~10s cold
Safety: the cache file stores a meta dict alongside the state -
skill fingerprint, model path+size, n_ctx, chat_format. On load,
stale meta (different skill, different model, different context
size) is rejected and the process warms fresh. Corrupt pickle or
wrong-shape payloads also silently fall back to fresh warm.
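A minimal sketch of that save/validate flow, assuming a
{"meta": ..., "state": ...} pickle layout and free-standing helper names
(the real logic lives on BonsaiIngestor; save_state/load_state are the
llama-cpp-python calls described above):

```python
# Hedged sketch -- file layout and helper names are assumptions.
import os
import pickle

def save_kv_cache(llm, path, meta):
    # meta: skill fingerprint, model path+size, n_ctx, chat_format
    with open(path, "wb") as f:
        pickle.dump({"meta": meta, "state": llm.save_state()}, f)

def try_load_kv_cache(llm, path, expected_meta):
    """Return True if the cached state was restored; False -> warm fresh."""
    if not path or not os.path.exists(path):
        return False                        # opt-in: no-op without a path/file
    try:
        with open(path, "rb") as f:
            payload = pickle.load(f)
        if not isinstance(payload, dict) or payload.get("meta") != expected_meta:
            return False                    # stale meta or wrong-shape payload
        llm.load_state(payload["state"])    # restore llama.cpp state
        return True
    except Exception:
        return False                        # corrupt pickle: silent fallback
```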
Disk cost: ~406 MB for n_ctx=2048 on 4B TQ1_0. Opt-in; no-op when
kv_cache_path is not set.
Skipped: `-march=native` llama.cpp rebuild. That would only optimize for
the build machine and could ship binaries that crash on CPUs without the
same ISA. Kept portable instead.
Tests: +23 unit tests (48 total on bonsai_ingestor)
- compact parser: all-sections / none / missing / case-insensitive /
fence tolerance / escaped quotes / unknown-prefix ignore
- dsl synthesis: min-turn / entities + matching edges / dedupe /
belief + retract pair / quote escaping / overall kind order
- KV cache: no-op without path / no-op without llm / missing file /
stale meta rejection / corrupt pickle / wrong shape / meta shape
Full suite: 1850 passed, 101 skipped.
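As an illustration, a hedged sketch of the stale-meta rejection case
(test name and meta fields are assumptions; it exercises the
try_load_kv_cache helper sketched above):

```python
# Hypothetical test sketch -- names and meta fields are assumptions.
import pickle

def test_stale_meta_rejected(tmp_path):
    path = tmp_path / "kv.bin"
    stale = {"skill": "old-fp", "n_ctx": 512}
    path.write_bytes(pickle.dumps({"meta": stale, "state": b"..."}))

    class FakeLlm:
        def load_state(self, state):        # must never be reached
            raise AssertionError("stale cache must not be loaded")

    fresh = {"skill": "new-fp", "n_ctx": 2048}
    assert try_load_kv_cache(FakeLlm(), str(path), fresh) is False
```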
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🤖 Generated with Claude Code